Report for PAKDD 2006 Data Mining Competition

Author

  • Jiang Su
Abstract

The task requires a classifier that gives high prediction accuracy on 3G customers while keeping a reasonable misclassification rate on 2G customers. In this report, we first learn a classification model with high AUC (the area under the ROC curve), based on our latest research results on classification algorithms, and then find an optimal decision threshold from the resulting ROC (Receiver Operating Characteristic) curve. Finally, we present the human-understandable knowledge learned from this real-world problem.

1 Understanding of the problem

The PAKDD 2006 Data Mining Competition offered an excellent opportunity to apply data mining to the identification of potential telecom customers. The task is to learn a classification model that identifies as many potential 3G customers as possible. The original training data set consisted of 20,000 2G network customers and 4,000 3G network customers, each described by more than 200 data fields. A 3G customer is defined as a customer who has a 3G Subscriber Identity Module (SIM) card and is currently using a 3G network compatible mobile phone.

Based on this problem description, we face three challenges.

First, a 2G customer may switch to 3G at any time, and a 3G customer may previously have been a 2G customer. It is therefore not surprising that the same customer may appear with different class labels.

Second, instead of a properly defined misclassification cost function, the task gives only a loosely specified requirement on the true positive rate: identify as many 3G customers as possible. For any classifier there is always a trade-off between the true positive rate and the true negative rate, where the positive class refers to 3G customers and the negative class refers to 2G customers. Thus, the underlying requirement is to maximize the true positive rate while keeping the true negative rate reasonable.

Third, human-understandable knowledge should be presented that is useful for making business decisions, such as different promotion strategies for various customer groups.
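The trade-off between true positive rate and true negative rate can be illustrated with a minimal Python sketch. It is not the method used in the report: the function names, the toy scores, and the `min_tnr` floor (a stand-in for a "reasonable" true negative rate) are all assumptions made for illustration. The sketch scans every candidate threshold on the ROC curve and keeps the one with the highest true positive rate among those that satisfy the floor.

```python
def roc_points(scores, labels):
    """Return (threshold, tpr, tnr) for each distinct score used as a cut-off.

    A customer is predicted 3G (positive, label 1) when score >= threshold.
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        points.append((t, tp / pos, tn / neg))
    return points


def pick_threshold(scores, labels, min_tnr):
    """Highest-TPR operating point whose TNR is at least min_tnr.

    min_tnr is a hypothetical parameter; the report does not state a value.
    Returns None when no threshold satisfies the floor.
    """
    feasible = [p for p in roc_points(scores, labels) if p[2] >= min_tnr]
    if not feasible:
        return None
    # Prefer higher TPR; break ties by higher TNR.
    return max(feasible, key=lambda p: (p[1], p[2]))


# Toy data (invented): model scores and true labels (1 = 3G, 0 = 2G).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0, 1, 0]

best = pick_threshold(scores, labels, min_tnr=0.75)
print(best)  # (threshold, true positive rate, true negative rate)
```

Lowering `min_tnr` moves the chosen point along the curve toward higher true positive rates at the price of more misclassified 2G customers, which is the trade-off described above.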

Publication date: 2006